DataFrame Changes

record_df_info

pywrangle.df_changes.record_df_info.record_df_info(df: DataFrame, name: Union[str, int] = None) → dict

Returns a dict with information about a DataFrame: its name, number of columns (cols), number of rows (rows), and size (total number of cells, i.e. cols × rows).

Parameters
  • df (DataFrame) – DataFrame to record information from.

  • name (Union[str, int], optional) – Name of the DataFrame for comparison. Defaults to None.

Returns

A dict with the keys name, cols, rows, and size describing the DataFrame.

Return type

dict
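To make the contents of the returned dict concrete, here is a minimal pandas sketch that builds the same four keys. `record_df_info_sketch` is a hypothetical name, not part of pywrangle, and the sketch assumes pywrangle derives cols, rows, and size from the DataFrame's shape; the actual implementation may differ.

```python
import pandas as pd


def record_df_info_sketch(df: pd.DataFrame, name=None) -> dict:
    """Hypothetical re-implementation: snapshot a DataFrame's shape as a dict."""
    rows, cols = df.shape  # pandas reports shape as (rows, columns)
    return {
        "name": name,
        "cols": cols,
        "rows": rows,
        "size": int(df.size),  # number of cells, i.e. rows * cols
    }


# A 20-row by 10-column DataFrame of integers, mirroring the documented example.
df = pd.DataFrame([[0] * 10 for _ in range(20)])
print(record_df_info_sketch(df))
# {'name': None, 'cols': 10, 'rows': 20, 'size': 200}
```

Because the result is a plain dict of integers, it stays valid after the original DataFrame is modified or rebound.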

Notes

  • This function allows users to change a DataFrame while keeping a record of its previous state.

    • For instance, after filtering a DataFrame, you may compare the filtered DataFrame with the recorded state of the original using the print_df_info function.
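The record-then-filter workflow described in the Notes can be sketched with plain pandas. This sketch builds the snapshot dict by hand (an assumption standing in for record_df_info) so it runs without pywrangle installed; the column names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": range(10), "b": range(10, 20)})

# Snapshot the current state as a plain dict (stand-in for record_df_info).
before = {"name": "raw", "cols": len(df.columns), "rows": len(df), "size": int(df.size)}

# Filtering rebinds df to a new, smaller DataFrame; the snapshot is unaffected.
df = df[df["a"] >= 5]
after = {"name": "filtered", "cols": len(df.columns), "rows": len(df), "size": int(df.size)}

print(before["rows"], "->", after["rows"])  # 10 -> 5
```

With pywrangle itself, the snapshot would come from `pw.record_df_info(df, name="raw")`, and since print_df_info accepts both recorded dicts and DataFrames, the comparison could then be printed with `pw.print_df_info(before, df)`.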

Example

>>> df = create_df.create_int_df_size(cols=10, rows=20)
>>> df_info = pw.record_df_info(df)
>>> print(df_info)
{'name': None, 'cols': 10, 'rows': 20, 'size': 200}

print_df_info

pywrangle.df_changes.print_df_info.print_df_info(*args: List[Union[df, dict]], compare_dfs: bool = True, compare_base_df: int = 0, compare_end_df: int = -1, abs_comparison: bool = True, relative_comparison: bool = True) → None

Prints information about each DataFrame in args and, optionally, the differences between two of them.

Each element of args may be either a pd.DataFrame or a dict returned by the record_df_info function.
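Since args can mix DataFrames and recorded dicts, each element has to be turned into the same info-dict shape before printing. The following pandas sketch shows one way to do that; `to_info` and `collect_infos` are hypothetical helpers, and the rule that an unnamed dict falls back to its position in args is an assumption extrapolated from the note that DataFrames are named by their index.

```python
import pandas as pd


def to_info(arg, index: int) -> dict:
    """Hypothetical helper: turn one element of args into an info dict."""
    if isinstance(arg, pd.DataFrame):
        rows, cols = arg.shape
        # A DataFrame is named by its position in args.
        return {"name": index, "cols": cols, "rows": rows, "size": int(arg.size)}
    if isinstance(arg, dict):
        info = dict(arg)  # copy so the caller's recorded dict is not mutated
        # Assumption: an unnamed recorded dict also falls back to its position.
        if info.get("name") is None:
            info["name"] = index
        return info
    raise TypeError(f"expected a DataFrame or dict, got {type(arg).__name__}")


def collect_infos(*args) -> list:
    """Normalise every element of args, keeping their order."""
    return [to_info(arg, i) for i, arg in enumerate(args)]


recorded = {"name": None, "cols": 10, "rows": 20, "size": 200}
current = pd.DataFrame([[1, 2, 3]] * 4)
print(collect_infos(recorded, current))
# [{'name': 0, 'cols': 10, 'rows': 20, 'size': 200}, {'name': 1, 'cols': 3, 'rows': 4, 'size': 12}]
```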

Parameters
  • args (List[Union['df', dict]]) – DataFrames and/or dicts returned by record_df_info whose information is printed.

  • compare_dfs (bool, optional) – Whether to show the difference between two DataFrames, as absolute and/or relative values (see abs_comparison and relative_comparison). Defaults to True.

  • compare_base_df (int) – Index in args of the base DataFrame for the comparison. Defaults to 0.

  • compare_end_df (int) – Index in args of the DataFrame to compare against the base. Defaults to -1 (the last element).

  • abs_comparison (bool) – Whether to show the absolute difference between the two DataFrames. Defaults to True.

  • relative_comparison (bool) – Whether to show the relative (%) difference between the two DataFrames. Defaults to True.
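How the comparison parameters interact can be illustrated with a small pure-Python sketch over info dicts shaped like record_df_info's return value. `compare_infos` is a hypothetical name, not part of pywrangle, and its output dict only mirrors the Abs Diff and % Diff rows of the printed table; compare_dfs is left out because it would simply decide whether this step runs at all.

```python
def compare_infos(infos, compare_base_df=0, compare_end_df=-1,
                  abs_comparison=True, relative_comparison=True):
    """Hypothetical sketch of the comparison print_df_info reports."""
    # Plain list indexing, so the default -1 selects the last entry in args.
    base = infos[compare_base_df]
    end = infos[compare_end_df]
    result = {}
    for key in ("cols", "rows", "size"):
        diff = end[key] - base[key]
        if abs_comparison:
            result.setdefault("Abs Diff", {})[key] = diff
        if relative_comparison:
            # Percentage of the base value; None avoids dividing by zero
            # (an assumption, pywrangle's behaviour for a zero base is not documented).
            pct = None if base[key] == 0 else round(diff / base[key] * 100, 1)
            result.setdefault("% Diff", {})[key] = pct
    return result


# Values from the print_df_info example: 20 cols / 40 rows first, then 10 cols / 20 rows.
infos = [
    {"name": 0, "cols": 20, "rows": 40, "size": 800},
    {"name": 1, "cols": 10, "rows": 20, "size": 200},
]
print(compare_infos(infos))
# {'Abs Diff': {'cols': -10, 'rows': -20, 'size': -600}, '% Diff': {'cols': -50.0, 'rows': -50.0, 'size': -75.0}}
```

Swapping compare_base_df and compare_end_df flips the sign of the absolute differences and changes the denominator of the relative ones.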

Notes

  • DataFrames are named by their index (position) in args.

  • Relative (%) difference is calculated as a percentage of the base DataFrame's value: (end - base) / base × 100.
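Written out for one quantity x (cols, rows, or size), the relative difference and its value for the size row of the print_df_info example, where the base is index 0 (800 cells) and the end is index -1 (200 cells), are:

```latex
\Delta_{\%}(x) = \frac{x_{\text{end}} - x_{\text{base}}}{x_{\text{base}}} \times 100

\Delta_{\%}(\text{size}) = \frac{200 - 800}{800} \times 100 = -75.0
```

The same formula gives -50.0 for both cols ((10 - 20) / 20) and rows ((20 - 40) / 40), matching the % Diff row of the example output.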

Example

>>> df1, df2 = (create_df.create_int_df_size(cols=i * 10, rows=i * 20) for i in range(1, 3))
>>> pw.print_df_info(df2, df1)

Name       |    Cols   |    Rows   |    Size
--------   |   -----   |   -----   |   -----
0          |      20   |      40   |     800
1          |      10   |      20   |     200
Abs Diff   |     -10   |     -20   |    -600
% Diff     |   -50.0   |   -50.0   |   -75.0

Compared indices -1 & 0